Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs

نویسندگان

  • Nevin Lianwen Zhang
  • Weihong Zhang
چکیده

Finding optimal policies for general partially observable Markov decision processes (POMDPs) is computationally difficult primarily due to the need to perform dynamic-programming (DP) updates over the entire belief space. In this paper, we first study a somewhat restrictive class of special POMDPs called almost-discernible POMDPs and propose an anytime algorithm called spaceprogressive value iteration(SPVI). SPVI does not perform DP updates over the entire belief space. Rather it restricts DP updates to a belief subspace that grows over time. It is argued that given sufficient time SPVI can find near-optimal policies for almost-discernible POMDPs. We then show how SPVI can be applied to more a general class of POMDPs. Empirical results are presented to show the effectiveness of SPVI.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Evaluation of Sensors' Reliability and Their Tuning for Multisensor Data Fusion within the Transferable Belief Model

On Preference Representation on an Ordinal Scale p. 18 Rule-Based Decision Support in Multicriteria Choice and Ranking p. 29 Propositional Distances and Preference Representation p. 48 Value Iteration over Belief Subspace p. 60 Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs p. 72 Reasoning about Intentions in Uncertain Domains p. 84 Troubleshooting with Simultaneo...

متن کامل

Point-based value iteration: An anytime algorithm for POMDPs

This paper introduces the Point-Based Value Iteration (PBVI) algorithm for POMDP planning. PBVI approximates an exact value iteration solution by selecting a small set of representative belief points, and planning for those only. By using stochastic trajectories to choose belief points, and by maintaining only one value hyperplane per point, it is able to successfully solve large problems, incl...

متن کامل

Anytime Point Based Approximations for Interactive POMDPs

Partially observable Markov decision processes (POMDPs) have been largely accepted as a rich-framework for planning and control problems. In settings where multiple agents interact POMDPs prove to be inadequate. The interactive partially observable Markov decision process (I-POMDP) is a new paradigm that extends POMDPs to multiagent settings. The added complexity of this model due to the modeli...

متن کامل

Value Iteration over Belief Subspace

Partially Observable Markov Decision Processes (POMDPs) provide an elegant framework for AI planning tasks with uncertainties. Value iteration is a well-known algorithm for solving POMDPs. It is notoriously difficult because at each step it needs to account for every belief state in a continuous space. In this paper, we show that value iteration can be conducted over a subset of belief space. T...

متن کامل

Heuristic Search Value Iteration for POMDPs

We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI’s soundness an...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001